DOCUMENT RESUME

ED 429 998                                                  TM 029 708

AUTHOR       Dawadi, Bhaskar R.
TITLE        Robustness of the Polytomous IRT Model to Violations of the
             Unidimensionality Assumption.
PUB DATE     1999-04-00
NOTE         50p.; Paper presented at the Annual Meeting of the American
             Educational Research Association (Montreal, Quebec, Canada,
             April 19-23, 1999).
PUB TYPE     Reports - Evaluative (142) -- Speeches/Meeting Papers (150)
EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  *Ability; Estimation (Mathematics); Factor Analysis; *Item
             Response Theory; Models; *Robustness (Statistics);
             Simulation
IDENTIFIERS  *Polytomous Variables; Unidimensionality (Tests); *Violation
             of Assumptions



ABSTRACT 



The robustness of the polytomous Item Response Theory (IRT) 
model to violations of the unidimensionality assumption was studied. A 
secondary purpose was to provide guidelines to practitioners to help in 
deciding whether to use an IRT model to analyze their data. In a simulation 
study, the unidimensionality assumption was deliberately violated by using 
two-dimensional data. The "impact," or change in the error due to violating 
the assumption, was calculated to assess the effects of the violation on 
ability estimation. The effects of problematic variables on absolute impacts 
and their interactions were analyzed, and a factor analysis using the 
principal component method was conducted to provide guidelines to 
practitioners on the computer-generated data. The precision of the 
estimated ability was determined in four ways. When the ability estimate was 
assumed to measure the average ability and the major ability of two unequally 
important abilities, the procedure was generally robust to the violation. 
However, when the ability estimate was assumed to measure one of the two 
equally important abilities and the minor ability of two unequally important 
abilities, the estimation procedure was not robust. Results from the analysis 
of variance were consistent with the results from analyzing the relative 
impacts. (Contains 5 tables and 47 references.) (Author/SLD) 









Robustness of the Polytomous IRT Model to Violations of the 
Unidimensionality Assumption 



Bhaskar R. Dawadi 
Georgia Examining Boards 






A paper presented at the annual meeting of AERA 
Montreal, Canada, April 1999 






Abstract 



The primary purpose of this study was to investigate the robustness of the polytomous 
Item Response Theory (IRT) model to violations of the unidimensionality assumption. The 
secondary purpose was to provide guidelines to practitioners to help in deciding whether to use 
an IRT model to analyze their data. 

The unidimensionality assumption was deliberately violated by using two-dimensional 
data. The "Impact," or change in the error due to violating the assumption, was calculated to 
assess the effects of the violation on ability estimation. Problematic variables were identified 
using Relative Impacts, and the effects of those variables on Absolute Impacts and their 
interactions were analyzed using a 2^6 factorial ANOVA. A factor analysis using the principal
component (PC) method was conducted to provide guidelines to practitioners on the
computer-generated two-dimensional data.

The precision of estimated ability was determined in four ways by comparing the 
estimated ability: (1) with the average of two true abilities, (2) with one of the two equally 
important abilities, (3) with the major ability of two unequally important abilities, and (4) with 
the minor ability of two unequally important abilities. 

When the ability estimate was assumed to measure the average ability (1) and the major 
ability of two unequally important abilities (3), the procedure was generally robust to the 
violation. However, when the ability estimate was assumed to measure one of the two equally 
important abilities (2) and the minor ability of two unequally important abilities (4), the 
estimation procedure was not robust. Results from the ANOVA were consistent with the
results obtained from analyzing the Relative Impacts.






Introduction: 



The demand for performance assessment is increasing in the field of educational 
measurement (Crocker, 1995). California has replaced the paper and pencil test with a new 
statewide examination using performance assessment ("California's New Academic Assessment 
System," 1996). Performance assessments allow scorers and assessors to evaluate skills achieved 
by students, skills that cannot be measured by traditional modes of evaluation such as a paper and 
pencil test (Oosterhof, 1990). A rating scale is commonly used to record observations in 
performance assessment. A polytomous item response theory (IRT) model is one of the 
appropriate tools for analyzing observations recorded by the rating scale (Andrich, 1978a and 
1978b). IRT models have been used by test developers and measurement specialists in various 
applications, such as in customized testing, criterion-referenced testing, and national assessment 
(Hambleton, 1989). 

Polytomous IRT models are based on a set of strong assumptions. The assumptions are 
necessary for the integrity of the models and these assumptions help to bring the mathematical 
complexity of the model within reasonable bounds. Unidimensionality is one of the assumptions 
of polytomous IRT models. The assumption of unidimensionality states that only one ability or 
trait is necessary to "explain" or "account" for an examinee's test performance (Hambleton &
Swaminathan, 1985). However, in practice, data obtained from educational achievement tests do
not always satisfy the unidimensionality assumption of IRT models (Traub, 1983). Even
constructs such as vocabulary ability are multidimensional if analyzed in enough detail (Reckase,
Ackerman, and Carlson, 1988). Many studies have shown that when the unidimensionality
assumption of the dichotomous IRT models was violated, the results obtained from such analyses
were not valid (e.g., Folk & Green, 1989; Oshima and Miller, 1990; Dorans & Kingston, 1985;
Downing and Haladyna, 1996).

Studies have been conducted to detect the effect of various degrees of violation of the 
unidimensionality assumption for the dichotomous IRT model (e.g., Reckase, 1979; Drasgow 
and Parson, 1987; Harrison, 1988; Zeng, 1989; Dirir and Sinclair, 1996). However, studies 
indicating the robustness of a unidimensional polytomous model when the test is 
multidimensional are rare thus far. To the best of my knowledge, only DeAyala
(1995), using computer-simulated multidimensional data, has explored the systematic effect of
multidimensionality on estimation in Masters' partial credit model (Masters, 1982). This
model assumes that all items are equally effective at discriminating among examinees.
Parameter estimates were obtained in that study using the computer program MSTEPS (Wright,
Congdon, and Schultz, 1989). The study was conducted using both compensatory and
noncompensatory multidimensional data.

In a compensatory model with multidimensional data, the author found that the estimated
ability (θ̂) was a better estimate of the mean (θ̄) of two abilities θ₁ and θ₂ than of either individual
ability θ₁ or θ₂. As the multidimensionality of the data decreased, the "differences in RMSE and
bias with respect to θ₁, θ₂, and mean θ diminished" (p. 413). In a noncompensatory model, when
the data were multidimensional, the accuracy of estimated ability (θ̂) was consistently greater for
mean ability (θ̄) than for either θ₁ or θ₂. However, when the test was divided into two individual
dimensions, the Root Mean Square Error (RMSE) corresponding to one of the examinee's
abilities was lower than the RMSE with respect to the mean ability (θ̄). As the correlations
between two abilities (θ₁ and θ₂) increased, the RMSE decreased; however, one ability (θ₁) was
better estimated than the other ability (θ₂) throughout the θ continuum.

Purpose:

The primary purpose of this study was to investigate the robustness of the
unidimensional polytomous IRT model when the data analyzed by the model were
multidimensional. Data were generated to fit the generalized logistic partial credit model
(GPCM) (Muraki, 1992a). The GPCM is derived by adding a slope parameter (a) to the
equation for Masters' partial credit model. Thus, in the GPCM each item can have a different
discriminating capability.
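
As an illustration of how the slope enters the model, the following minimal sketch computes GPCM category probabilities for a single item. It assumes the commonly cited form of the model, in which the probability of category k is proportional to the exponential of the cumulative sum of a(θ - b_v) over the first k steps. The function name, argument names, and example values are hypothetical, and the scaling constant used in some parameterizations is omitted. Setting the slope to 1 gives the partial credit case.

```python
import math

def gpcm_probabilities(theta, slope, thresholds):
    """Category probabilities for one item under the GPCM.

    `thresholds` holds the step parameters b_1..b_m, so the item has
    m + 1 ordered categories scored 0..m. (Illustrative names only.)
    """
    # Cumulative sums of slope * (theta - b_v); category 0 has an empty sum.
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + slope * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(z - peak) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 3-category item with thresholds at -1 and +1.
example = gpcm_probabilities(theta=0.0, slope=1.0, thresholds=[-1.0, 1.0])
```

With a larger slope the probabilities change more sharply as θ moves across a threshold, which is the sense in which items differ in discriminating capability.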

The secondary purpose of this study was to provide a general guideline to practitioners to 
help in deciding whether using an IRT model to analyze their data would be appropriate or not. 
In a simulation study like this, all results are obtained in terms of true parameters. However, in 
practice, true parameters of the data are not known to practitioners. Thus, a factor analysis using
the principal component method (PCA) was used to generate such guidelines.

Method:

The study was conducted using computer-generated data. The sample size was fixed at
1000 for all simulations in this study. To control the complexity of the study, only two
dimensions, θ₁ and θ₂, were used in generating item response data; i.e., the study included
cases with violations of the unidimensionality assumption due to two underlying abilities (θ₁ and
θ₂). The six variables (or parameters) that were systematically varied in the study, each at two
levels, are listed in Table 1.



Insert Table 1 about here 



Of the six variables that were systematically varied during the study, all
except Dimensional Strength are self-explanatory and commonly used in this type of simulation
study. In this study the variable Dimensional Strength was used to express the relative importance
or strength of the dimensions on a test, and it was manipulated by increasing or decreasing the
number of items representing each dimension. In a test with 8 items, if six items represent dimension one
(θ₁) and two items represent dimension two (θ₂), common sense tells us that the Dimensional
Strength (or the relative importance) of ability θ₁ is greater than that of ability θ₂. The two
levels of dimensional strength, expressed as the percentage of items that represented each
dimension, were 50/50 and 25/75. At the 50/50 level, each
dimension was equally represented, whereas at the 25/75 level, 25 percent of
the total number of items in a test represented dimension one and the remaining 75 percent of
items represented dimension two.
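
The crossing of six two-level variables can be sketched as follows. The abbreviations and level values are those that appear in the text of this paper (CAT, COR, DS, SLP, THR, TL). Table 1 is not reproduced here, and the two test-length levels are not stated in this part of the paper, so the TL labels below are placeholders and the whole listing should be read as an assumption-laden illustration.

```python
from itertools import product

# Levels as they appear in the text. The TL labels are placeholders
# because the two test lengths are not given in this section.
LEVELS = {
    "CAT": (3, 5),                    # number of response categories
    "COR": (0.3, 0.8),                # correlation between the two abilities
    "DS": ("50/50", "25/75"),         # dimensional strength
    "SLP": (0.5, 1.0),                # item slope
    "THR": ("-1 to +1", "-2 to +2"),  # item category threshold range
    "TL": ("TL_level_1", "TL_level_2"),
}

def treatment_conditions(levels=LEVELS):
    """Return every combination of levels as a list of dictionaries."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

conditions = treatment_conditions()
```

Crossing the six variables gives 2^6 = 64 treatment conditions, half of which use each level of any single variable.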

Root Mean Square Error (RMSE) was calculated using true and estimated abilities for 
both unidimensional and two-dimensional data, separately. RMSE was calculated using the 
following formula: 



\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( \hat{\theta}_{i} - \theta_{\mathrm{true},i} \right)^{2}}{n}}
\]

where θ̂ᵢ = individually estimated ability (θ̂) by PARSCALE for both unidimensional and
two-dimensional data

θ_true,i = true ability, which is θ₁, θ₂, or the average (θ̄) of θ₁ and θ₂ for
multidimensional data and simply θ for unidimensional data

n = 1000, the number of individuals taking the test
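
A minimal sketch of this calculation is given below. The function and variable names are illustrative and are not taken from PARSCALE or any other program used in the study. The second helper returns the three RMSEs that apply to two-dimensional data, one for each choice of true ability.

```python
import math

def rmse(estimated, true):
    """Root mean square error between estimated and true abilities."""
    if len(estimated) != len(true) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((e - t) ** 2 for e, t in zip(estimated, true))
    return math.sqrt(squared_error / len(estimated))

def rmse_against_targets(estimated, theta1, theta2):
    """RMSE of one ability estimate against theta1, theta2, and their mean."""
    average = [(a + b) / 2.0 for a, b in zip(theta1, theta2)]
    return {
        "theta1": rmse(estimated, theta1),
        "theta2": rmse(estimated, theta2),
        "average": rmse(estimated, average),
    }

# Hypothetical values for two examinees, for illustration only.
example = rmse_against_targets([0.0, 0.0], [1.0, 1.0], [-1.0, -1.0])
```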



Data Generation:

Both unidimensional and two-dimensional data were generated using the RESGEN
program (Muraki, 1996). Abilities in the unidimensional data were generated independently of
the abilities in the two-dimensional data; i.e., the unidimensional θ was not the same as θ₁ or θ₂ of
the two-dimensional data. However, abilities for both two-dimensional and unidimensional data
were generated under the same conditions.
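
The two ingredients of the two-dimensional design, a pair of correlated abilities and an assignment of items to dimensions, can be sketched as below. This is not the RESGEN procedure. It assumes standard normal abilities (the ability distribution is not stated in this section), and all function names, the seed, and the example values are hypothetical.

```python
import math
import random

def correlated_abilities(n, correlation, rng):
    """Return n (theta1, theta2) pairs of standard normal abilities.

    theta2 is built from theta1 plus independent noise so that the
    population correlation between the two equals `correlation`.
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    residual_sd = math.sqrt(1.0 - correlation ** 2)
    pairs = []
    for _ in range(n):
        theta1 = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        pairs.append((theta1, correlation * theta1 + residual_sd * noise))
    return pairs

def assign_items(test_length, percent_dim1):
    """Label each item with the dimension (1 or 2) it represents."""
    if (test_length * percent_dim1) % 100 != 0:
        raise ValueError("percentage does not give a whole number of items")
    n_dim1 = test_length * percent_dim1 // 100
    return [1] * n_dim1 + [2] * (test_length - n_dim1)

rng = random.Random(1999)                # fixed seed so the sketch is repeatable
abilities = correlated_abilities(1000, 0.3, rng)
item_dimensions = assign_items(8, 25)    # 25/75 split over a hypothetical 8-item test
```

Item responses would then be drawn item by item from the response model, using θ₁ for items labelled 1 and θ₂ for items labelled 2.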

The effect of a violation of the assumption of unidimensionality was assessed using true
and estimated abilities. Ability estimates came from PARSCALE (Muraki and Bock,
1993). RMSE was calculated for both multidimensional and unidimensional abilities. For the
unidimensional data, there was only one RMSE, calculated using the unidimensional true ability (θ)
and the estimated ability (θ̂). For the two-dimensional data, three different RMSEs were calculated
for each set of conditions, pairing the single estimated ability (θ̂) with true ability θ₁, with true
ability θ₂, and with the mean ability (θ̄), which was the average of true ability θ₁ and true
ability θ₂. PARSCALE did not provide a separate estimate of the average ability or of either
individual ability; only one ability estimate was obtained from PARSCALE for the
two-dimensional data. For simplicity of discussion in this study, the ability estimate obtained from
PARSCALE was assumed to measure, in turn, the average of the two true abilities and each of
the two individual abilities.

Analysis 

Identifying Problematic Conditions:




The Impact of the violation, represented by Y, was assessed by calculating the difference
between the RMSE of the two-dimensional data (RMSE_2D) and the RMSE of the unidimensional
data (RMSE_1D), i.e., Y = RMSE_2D - RMSE_1D. If the Impact (Y) was minimal, the model
could be said to be robust. If the Impact was large, the model could be said to be not robust.
During the study, Y was represented by IMPCT_AV, IMPCT_1 and IMPCT_2 for the average
ability (θ̄), ability one (θ₁) and ability two (θ₂), respectively. For analysis purposes, the Impact
was defined in two ways: Absolute Impact and Relative Impact.

The Relative Impact was the percentage increase of the root mean square error of the
two-dimensional data (RMSE_2D) with respect to the root mean square error of the unidimensional data
(RMSE_1D). It was calculated, for example for IMPCT_AV, by dividing the mean of IMPCT_AV
by RMSE_1D and multiplying the quotient by 100. The RMSE for both unidimensional and
two-dimensional data was calculated with 1000 subjects. The sample size of 1000 was enough to
reduce the variation due to chance between runs to a negligible level.

To give readers a sense of the practical importance of these Impacts, a threshold value of 
Relative Impacts was selected. It was assumed that a Relative Impact higher than 20 percent was 
practically important in this study. That is, an error of 20 percent or larger was regarded as a 
considerable amount of error for the purpose of identifying conditions in this study that were 
problematic to the violation. Any treatment condition that yielded a Relative Impact lower than 
20 percent was considered robust to violation of the unidimensionality assumption. For
discussion purposes, the treatment conditions were further classified, based on the threshold
value of Relative Impacts, into three categories: treatment conditions that were generally robust,
generally not robust, and most problematic to violation of the assumption. Using Impact as the
outcome allowed the researcher to answer the following two basic questions: 1) Under what
conditions would the violation become problematic? and 2) What were the effects of each of the
study variables on Absolute Impact?
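
The Impact calculations described in this section can be sketched as follows. The function names and the numbers in the example are hypothetical. The text describes the threshold both as "higher than 20 percent" and as "20 percent or larger"; the sketch treats a value of exactly 20 percent as problematic, which is an assumption.

```python
THRESHOLD_PERCENT = 20.0  # threshold value of Relative Impact used in the study

def absolute_impact(rmse_2d, rmse_1d):
    """Absolute Impact: Y = RMSE_2D - RMSE_1D."""
    return rmse_2d - rmse_1d

def relative_impact(replicate_impacts, rmse_1d):
    """Mean Absolute Impact over replications, as a percentage of RMSE_1D."""
    if not replicate_impacts or rmse_1d <= 0:
        raise ValueError("need at least one replicate and a positive RMSE_1D")
    mean_impact = sum(replicate_impacts) / len(replicate_impacts)
    return 100.0 * mean_impact / rmse_1d

def is_problematic(replicate_impacts, rmse_1d, threshold=THRESHOLD_PERCENT):
    """True when the Relative Impact reaches the threshold (assumed >=)."""
    return relative_impact(replicate_impacts, rmse_1d) >= threshold

# Hypothetical condition: two replications and a unidimensional RMSE of 0.5.
example_percent = relative_impact([0.30, 0.30], 0.5)
```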

Developing Practice Guidelines Using the PCA Results:

To assist practitioners in using IRT models for analyzing their data, results from the
IRT analysis were compared with the results from the factor analysis. The guidelines were based
on what an analyst would reasonably know about the data from a simple factor analysis.
From the results of the factor analysis, practitioners would know the number of factors
in their data (i.e., the dimensions of ability), the correlations between those factors (if there were more
than one factor), and the dimensional strength (or relative importance) of each dimension. The
dimensional strength would be known by looking at the factor loadings of each item.
Analysts would also know from the outset the number of response categories of each test item
and the length of the test. Thus, only those variables that were easily known to practitioners (the
number of factors, the correlations between factors (COR), the
dimensional strength (DS), the number of response categories (CAT), and test length (TL)) were
used to provide guidelines to practitioners in deciding for themselves whether to use an IRT
model for data analysis. Item Category Threshold Range (THR) and Item Slopes (SLP)
were the other two of the six variables that were systematically varied in this study. However, those
variables are not available to practitioners from the results of a factor analysis because THR and
SLP are specific to IRT analysis. Thus, THR and SLP were excluded when guidelines were
developed for practitioners.

While developing the guidelines, first, the number of factors was determined for all
treatment conditions. Second, Relative Impacts (already calculated) of the treatment conditions
were compared with the threshold of 20 percent to determine the problematic conditions. Third,
based on the correlations obtained from the factor analysis, guidelines to practitioners were
developed. When only one factor was extracted, there was no correlation; thus, conditions other
than correlation, such as the number of response categories and the dimensional strength, were
applied to develop the guidelines. The reliability of these guidelines was judged by calculating
the success rate (how often the results obtained by using the guidelines were correct).

Effects of Study Variables on Impacts:

A 2^6 factorial design was used to test the significance of the effects of the six variables and their
interactions on Impacts. The Absolute Impact was used as the dependent variable in the analysis
of variance (ANOVA). With six variables in the design and each variable having two levels, the
analysis had 64 treatment conditions, and each run was replicated twice. This replication
provided variability within each cell. In the ANOVA, the statistical significance of all
main and interaction effects was tested, and the highest order interactions were identified. The
practical importance of these higher order interactions was determined using the partial
eta-square, which is given by the following expression (Norusis/SPSS Inc, 1993):

\[
\text{partial } \eta^{2} = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
\]

where SS_effect is the sum of squares for the effect of interest and SS_error is the sum of
squares for error.



For the purpose of this study, it was assumed that interactions among variables would be of
practical importance if the partial eta-square was roughly 0.15 or greater. The adequacy of this
cut-off value for partial eta-square was supported by the large variation of simple main effects of
each variable. Once practically important higher order interactions were identified, simple main
effects of those factors at specific levels of other variables were calculated. For a detailed
discussion of these simple main effects, see Dawadi (1998).
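
The partial eta-square screening rule can be sketched as below. The function names and the sums of squares in the example are hypothetical; only the ratio itself and the cut-off of roughly 0.15 come from the text.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Sum of squares for the effect over (effect + error) sums of squares."""
    if ss_effect < 0 or ss_error < 0 or ss_effect + ss_error == 0:
        raise ValueError("sums of squares must be non-negative and not both zero")
    return ss_effect / (ss_effect + ss_error)

def practically_important(ss_effect, ss_error, cutoff=0.15):
    """Apply the study's cut-off of roughly 0.15 to an interaction effect."""
    return partial_eta_squared(ss_effect, ss_error) >= cutoff

# Hypothetical sums of squares, for illustration only.
example_value = partial_eta_squared(2.0, 8.0)
```

Because the denominator contains only the effect and error sums of squares, the values for different effects in the same design need not add up to 1.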

Results:

This results section is organized as follows: first, results from the analysis of
average ability (θ̄) are discussed, and then results from the analysis of individual ability
(ability θ₁ and ability θ₂) are discussed. The analysis of individual ability is further divided by
the variable dimensional strength: one of two equally important abilities (DS=50/50), the major
ability of two unequally important abilities (the ability represented by 75 percent of the total number
of items), and the minor ability of two unequally important abilities (the ability represented by 25
percent of the total number of items). Relative Impacts were used for identifying problematic
treatment conditions and for developing practice guidelines. Absolute Impacts were used in the
ANOVA procedure for identifying the statistically significant contrasts of each of the study
variables.

Identifying Problematic Conditions while Estimating Average Ability:

Means (mean_AV), standard deviations (std dev), and Relative Impacts (percent_AV) of
average ability (IMPCT_AV) and the root mean square error of unidimensional data (RMSE_1D) for
the 64 treatment conditions are presented in Table 2. Also presented in the table are the number
of factors and their correlations from the factor analysis.




Insert Table 2 about here 



As presented in Table 2, the mean value of IMPCT_AV, which was the absolute difference between
the RMSE of the two-dimensional and unidimensional data, ranged across the 64 treatment conditions
from a low of 0.001 to a high of 0.367. The standard deviations of IMPCT_AV ranged
from 0.001 to 0.06, showing little variation within specific combinations of conditions.

From analyzing Relative Impacts, it was clear that the correlation was an important
variable for classifying the robust and non-robust treatment conditions of this study. When the
correlation (COR) was 0.8, the results were generally robust (Relative Impact < 20 percent)
regardless of the levels of other variables used. However, when the correlation was 0.3, the
results were robust only when the number of categories (CAT) was 3 and the dimensional strength
(DS) was 50/50. Forty treatment conditions out of the total of 64 were counted by selecting the
following treatment conditions: 1) all treatment conditions with correlations of 0.8 and 2)
treatment conditions whose correlation was 0.3, number of response categories was 3, and
dimensional strength was 50/50. The Relative Impacts of 34 out of these 40 treatment conditions
(85 percent accuracy rate) were not problematic (Impact < 20 percent) to the violation when the
estimated ability was compared with the true average ability.

For those treatment conditions with correlations of 0.3, the procedure was not robust
when the number of response categories was 5, or when it was 3 and the dimensional strength was 25/75.
Twenty-four out of 64 treatment conditions fell under those conditions, and the Relative Impacts
of 20 of those 24 treatment conditions were greater than the threshold of 20 percent (83
percent accurate in detecting problematic treatment conditions). The most problematic
conditions were identified when the correlation was 0.3, the number of response categories was
5, and the dimensional strength was 25/75. Within these treatment conditions, the Relative
Impact ranged from 36.42 to 116.7 percent, a large impact due to the violation of the assumption.

Proposed Guidelines while Estimating Average Ability. Abiding by the rules
described earlier, the following guidelines were developed. When estimating the average ability,
practitioners can assume the IRT procedure to be robust to the violation if a single factor solution
was obtained from the factor analysis. Thirteen out of 14 treatment conditions (93 out of 100
times) were robust to the violation when only one factor was extracted by the factor analysis.

When two factors were extracted by the factor analysis, the correlation between those 
factors was evaluated. It was revealed that when the correlation between factors obtained by the 
factor analysis was greater than 0.4, the treatment conditions were not problematic to the 
violation. Under this condition, 15 out of 18 treatment conditions (83 percent correct decision) 
were not problematic to the violation. However, when the correlation was smaller than 0.4, the
number of response categories and the dimensional strength were critical to the robustness of the
procedure. When the correlation was smaller than 0.4, but the number of response categories
was 3 and the dimensional strength was 50/50, six out of eight treatment conditions (75 percent
correct decision) were not problematic to the violation. All other remaining treatment conditions
(24 out of 64 conditions) were problematic to the violation. These 24 treatment conditions
comprised DS=25/75, and CAT=5 and 3. For these treatment conditions, when the
correlation was smaller than 0.4, twenty out of 24 conditions (83 percent correct decision) were
problematic to the violation.
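
The guidelines for estimating the average ability amount to a short decision rule, sketched below together with the success rates reported above. The function name, argument names, and return labels are hypothetical. The text does not say how a factor correlation of exactly 0.4 should be handled; the sketch treats it as not greater than 0.4, which is an assumption.

```python
def average_ability_guideline(n_factors, factor_correlation=None,
                              categories=None, strength=None):
    """Classify a data set as 'robust' or 'problematic' from factor-analysis results."""
    if n_factors == 1:
        return "robust"
    if n_factors != 2:
        raise ValueError("the guidelines cover one- and two-factor solutions only")
    if factor_correlation is None:
        raise ValueError("a two-factor solution needs the factor correlation")
    if factor_correlation > 0.4:
        return "robust"
    # Correlation at or below 0.4: only CAT=3 with DS=50/50 is treated as robust.
    if categories == 3 and strength == "50/50":
        return "robust"
    return "problematic"

# (correct decisions, conditions covered) for each branch, as reported in the text.
REPORTED = {
    "one factor": (13, 14),
    "two factors, r > 0.4": (15, 18),
    "two factors, r < 0.4, CAT=3, DS=50/50": (6, 8),
    "all remaining conditions": (20, 24),
}

def success_rates(reported=REPORTED):
    """Percentage of correct decisions per branch, rounded to whole percent."""
    return {rule: round(100 * hits / total) for rule, (hits, total) in reported.items()}

rates = success_rates()
```

The four branches together cover 14 + 18 + 8 + 24 = 64 treatment conditions, which matches the size of the design.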

Effects of Study Variables while Estimating the Average Ability. The highest order
significant interaction was a 5-way interaction, CAT x COR x DS x SLP x THR (p = .001), and it
was also practically important. The contrasts of each variable at each level of the other variables
were calculated by aggregating over the variable TL. The variable TL was involved only in two of
the two-way interactions, COR x TL and DS x TL; however, the effect of TL on Impact was
not practically important. The effects of each of the study variables obtained from the ANOVA
of Absolute Impacts (IMPCT_AV) are ranked according to the magnitude of their
contrasts in Table 3. For practical purposes it was assumed that any simple main effect
contrast of Impacts greater than 0.1 would be of practical importance.



Insert Table 3 about here 



From the table it can be seen that, when estimating the average ability, CAT (0.316) had
the largest effect on Impact, followed by COR (-0.263), DS (-0.182), THR (-0.171) and SLP
(0.157). The positive contrasts of CAT and SLP indicated that Impact was larger
when the number of response categories was 5 and when the item
slope was 1.0 than when the number of response categories was 3 and when the item
slope was 0.5, respectively. The negative contrasts of COR, DS, and THR indicated that
Impact was larger when the correlation between the two abilities
was 0.3, when the dimensional strength was 25/75, and when the range of item category
thresholds was -1 to +1 than when the correlation was 0.8, when the dimensional
strength was 50/50, and when the range of item category thresholds was -2 to +2, respectively.

For the average ability, the effects of COR, DS, THR and SLP on Impacts were large
when the number of response categories was 5. Similarly, the effects of CAT, DS, THR and SLP
on Impacts were large when COR was 0.3. There was no obvious problematic level of the
dimensional strength, item slopes, or range of thresholds.

Estimating Individual Ability (θ₁ and θ₂)

While estimating two individual abilities, results were analyzed for three different
conditions: 1) when both θ₁ and θ₂ were equally important abilities, i.e., when both abilities were
represented by an equal number of items (DS=50/50); 2) when ability θ₂ was a major ability
represented by 75 percent of the total number of items (DS=25/75); and 3) when ability θ₁
was a minor ability represented by 25 percent of the total number of items (DS=25/75). Readers
are again reminded that PARSCALE did not separately estimate either the major ability (θ₂) of two
unequally important abilities or the minor ability (θ₁) of two unequally important abilities.
However, for ease of discussion in this section, the ability estimate obtained from
PARSCALE was assumed to measure one of two equally important abilities, the major ability
(θ₂) of two unequally important abilities, and the minor ability (θ₁) of two unequally important
abilities.

Means, standard deviations, and percentages of IMPCT_1 for ability θ₁ and IMPCT_2 for
ability θ₂ are presented in Table 4. Means are Absolute Impacts and percentages are Relative
Impacts. Also presented in the table are the RMSE of unidimensional data and the results from
the factor analysis.



Insert Table 4 about here 



Estimating One of Two Equally Important Abilities:



There were 32 treatment conditions with equal dimensional strength (50/50). For these 
conditions Absolute Impact varied from 0.072 to 0.510 for ability one and from 0.082 to 0.76 for 
ability two. The standard deviations of these conditions ranged from 0.001 to 0.472 for ability 
one and from 0.001 to 0.452 for ability two. Similarly, the Relative Impact varied from 12 to 224 
percent for ability one and from 12 to 270 percent for ability two. 

Identifying problematic conditions using study variables. While estimating either one
of the two equally important abilities, θ₁ or θ₂, it was found that the IRT procedure was not
robust when the correlation was 0.3. The procedure was found most problematic when the
correlation of 0.3 was combined with 5 response categories. Under these conditions, the
Impact for both abilities ranged approximately from 45 to 225 percent. Impacts were lowered
when the correlation was increased from 0.3 to 0.8. The combination of the number of categories
and the correlation had a considerable effect on the Impact. Eight out of eight treatment conditions
were problematic to the violation when the correlation of 0.8 was combined with 5
response categories; however, when the same correlation of 0.8 was combined with
3 response categories, only four out of eight treatment conditions were problematic
to the violation. This was an improvement of 50 percent compared with 5
categories. In general, when looking at the bigger picture of estimating either one of the two
equally important abilities, it was safe to conclude that the procedure for all 32 treatment conditions
was not robust to the violation. Fifty-eight out of 64 Impacts were beyond the threshold of 20
percent (91 percent correct decision).

Identifying problematic conditions using proposed practice guidelines. In this study there
were 32 treatment conditions with two equally important abilities, resulting in a total of 64
abilities (32 x 2 = 64). When estimating one of two equally important abilities, and when
only one factor was obtained from the factor analysis, five out of six treatment conditions
were problematic to the violation.

When two factors were extracted from the factor analysis, the correlation between those
factors was investigated. It was found that no matter what correlation was obtained
(0.001 < r < 0.74), treatment conditions were problematic to the violation. For the two-factor result, 24
out of 26 treatment conditions were problematic to the violation (92 out of 100 times accurate).

Estimating The Major Ability of Two Unequally Important Abilities:

The ability θ₂ was a major ability represented by 75 percent of the total number of items
when the dimensional strength was 25/75. The Impact for the estimated major ability of two
unequally important abilities was represented by IMPCT_2 in Table 4. The Absolute Impact
(mean_2) ranged from 0.008 to 0.106; its standard deviation ranged from 0.0001 to 0.042; and the
Relative Impact (percent_2) ranged from 1.35 to 26.15. The range of both Absolute and Relative
Impacts showed that the Impacts for the estimated major ability were minimal. As a reminder,
any treatment condition with a Relative Impact smaller than 20 percent was considered not
problematic to the violation.

Identifying problematic conditions using study variables. While estimating the major ability,
28 out of 32 treatment conditions were not problematic (Impact < 20 percent) to the violation.
Thus, when estimating the major ability of two unequally important abilities, the procedure under
those given treatment conditions was generally robust. These results were correctly identified 88
out of 100 times. There was an exception when the number of response categories was 5 and the
correlation was 0.3. Even with this exception, the procedure was robust in 5 out of 8 cases.
Thus, in general, we can conclude that when the dimensional strength was 25/75 and the ability
being estimated was represented by a larger number of items, the IRT model would be robust to
the violation regardless of the level of other variables used.

Identifying problematic conditions using proposed practice guidelines . When one factor 
was extracted by the factor analysis while estimating the major ability of two unequally important 
abilities, the treatment conditions were not problematic to the violation. Eight out of eight 
treatment conditions were identified as not problematic under these conditions. When two 
factors were extracted by the factor analysis, most of the time the procedure was still robust, i.e., 
treatment conditions were not problematic to the violation. Under these situations, 19 out of 24 
treatment conditions (79 percent correct decision) were not problematic to the violation. All five 
treatment conditions that were problematic had a correlation of 0.3, and three out of those five 
conditions had CAT=5. 

Estimating The Minor Ability of Two Unequally Important Abilities : 

The ability θ₁ was a minor ability represented by 25 percent of the total number of items
when the dimensional strength was 25/75. There were 32 treatment conditions when the level of
the dimensional strength was fixed to 25/75. The Impact for the estimated minor ability of two
unequally important abilities was represented by IMPCT_1 in Table 4. The Absolute Impact
(mean₁) ranged from 0.125 to 0.932, and its standard deviation ranged from 0 to 0.067. The
Relative Impact (percent₁) ranged from 19 to 439 percent. The range of both Impacts showed a
large variation for the estimated minor ability.

Identifying problematic conditions using study variables . Generally, the procedure was
not robust (Impact > 20 percent) to the violation while estimating the minor ability of two
unequally important abilities. Thirty-one out of 32 treatment conditions were not robust (97 out
of 100 times the result was correct). The most problematic treatment conditions occurred when
the correlation was 0.3 and the number of response categories was 5. Although the procedure
was not robust for nearly all of the 32 treatment conditions (Impact > 20 percent), the magnitude
of the Impact was much smaller when the correlation was 0.8 and the number of response
categories was 3.

Identifying problematic conditions using proposed practice guidelines . When one factor 
was extracted by the factor analysis, the procedure for seven out of eight treatment conditions (88 
percent correct) was not robust. When two factors were extracted, the IRT procedure for all 32 
treatment conditions was not robust. Thus, when minor ability was estimated, the procedure was 
not robust to the violation of the assumption no matter whether one or two factors were extracted 
by the factor analysis. 
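The "out of 100 times" and "percent correct" figures quoted in the preceding subsections are rounded proportions of treatment conditions. A minimal sketch that recomputes them from the counts given in the text follows; the dictionary labels and the helper name are illustrative only.

```python
import math

# (conditions classified as stated, total conditions), as reported in the text.
COUNTS = {
    "equal abilities, two factors, problematic": (24, 26),
    "major ability, study variables, not problematic": (28, 32),
    "major ability, two factors, not problematic": (19, 24),
    "minor ability, study variables, not robust": (31, 32),
    "minor ability, one factor, not robust": (7, 8),
}


def percent_correct(hits, total):
    """Proportion as a whole percent, rounding halves up (87.5 -> 88)."""
    return math.floor(100 * hits / total + 0.5)


for label, (hits, total) in COUNTS.items():
    print(f"{label}: {hits}/{total} = {percent_correct(hits, total)} percent")
```

Halves are rounded up explicitly because Python's built-in `round` rounds halves to the nearest even integer.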

Effects of Study Factors for Ability θ₁ and Ability θ₂ :

For IMPCT_1 there were three two-way interactions that were significant and practically
important: CATxDS, CORxDS, and DSxTHR. No interactions with the variables SLP and TL were
practically important for IMPCT_1. For IMPCT_2 the three two-way interactions that were
significant and practically important were CATxDS, CORxDS, and DSxSLP. No interactions
with the variables THR and TL were practically important for IMPCT_2. The effects of each of
the study variables were ranked in Table 5 according to the size of their contrast of average
Impacts. For practical purposes it was assumed that any simple main effects contrast of Impacts
greater than 0.1 would be of practical importance.



Insert Table 5 about here 



From the table it can be seen that COR (-0.465) had the largest effect on Impact, followed
by DS (-0.344), THR (-0.149), and CAT (0.145), when estimating the minor ability θ₁. The
variable DS (0.308) had the largest effect on Impact, followed by COR (-0.248), when estimating
the major ability θ₂. The two strongest effects were obtained from the variables COR and DS,
with somewhat weaker effects from the variables THR and CAT, when estimating the minor
ability. Effects were obtained only from the factors DS and COR when estimating the major
ability. The two factors SLP and TL did not have any practically important effects on Impact
when estimating either individual ability, θ₁ or θ₂.
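The ranking described above (order the contrasts by magnitude, keep those larger than 0.1 in absolute value) can be reproduced with a short sketch. Only the contrasts quoted in the text are used; the function and variable names are hypothetical, and reading the 0.1 cut-off as applying to the absolute value is an assumption.

```python
# Contrasts of average Impacts quoted in the text.
MINOR_ABILITY = {"COR": -0.465, "DS": -0.344, "THR": -0.149, "CAT": 0.145}
MAJOR_ABILITY = {"DS": 0.308, "COR": -0.248}

PRACTICAL_CUTOFF = 0.1  # contrasts larger than this were taken as important


def rank_effects(contrasts, cutoff=PRACTICAL_CUTOFF):
    """Return variable labels ordered from largest to smallest |contrast|.

    Only contrasts whose magnitude exceeds the cutoff are kept.
    """
    important = {name: c for name, c in contrasts.items() if abs(c) > cutoff}
    return sorted(important, key=lambda name: abs(important[name]), reverse=True)


print(rank_effects(MINOR_ABILITY))
print(rank_effects(MAJOR_ABILITY))
```

The sign of each contrast is kept in the data because it carries the direction of the effect, which the next paragraphs interpret.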

The effect of DS on Impact was large for ability θ₁ and ability θ₂ when the level of DS
was 25/75 and 50/50, respectively. Thus, depending on the ability estimated, the effect of the
variable dimensional strength on Impact was different. While estimating ability θ₁, the effect of
COR, DS and THR on the average Impact was always larger when the levels of these variables
were 0.3, 25/75, and -1 to +1, respectively. However, the effect of CAT on the average Impact
was large only when the number of response categories was 5.

The negative contrasts of COR, DS, and THR indicated that changing the level of each of
these variables from a lower to a higher level, i.e., 0.3 to 0.8, 25/75 to 50/50, and -1 to +1 to
-2 to +2, respectively, decreased the Impacts on the estimated ability θ₁. However, for the
contrast of CAT, changing the number of response categories from 5 to 3 (higher to lower level)
decreased the Impacts on the estimated ability θ₁.

Summary and Conclusion

The primary purpose of this study was to examine the robustness of the IRT generalized
partial credit (GPC) model to violation of the unidimensionality assumption, and the secondary
purpose was to provide guidelines to practitioners using the results from the factor analysis.

Estimating Average Ability :

From the analysis conducted in the first part, i.e., estimating the average of the two true 
abilities, it was found that regardless of the level of other variables used, the correlation was an 
important factor in the robustness of the IRT procedure when estimating the average ability. 

When the true correlation was 0.8, regardless of the other variables used, the results were 
generally robust to the violation of the assumption. When the true correlation was 0.3, the results 
were robust only when the number of response categories was 3 and the dimensional strength 
was 50/50. Treatment conditions were most problematic to the violation when the correlation 
was 0.3, category was 5, and dimensional strength was 25/75. Thus, from these results it can be 
seen that the correlation was an important variable followed by the number of response 
categories and then the dimensional strength. 

Drasgow and Parsons (1985) concluded that the unidimensional IRT model was robust to 
violation of the unidimensionality assumption when the correlation between common factors was 
0.4 or higher. This was the same correlation value obtained from the factor analysis result (not 
true correlation) that was recommended in this study. As a caveat, Drasgow and Parsons's
study was conducted with the dichotomous IRT model, and the conditions were different from
those used in this study.

Results from the effects of study variables . Results from the ANOVA analysis were
obtained by analyzing Absolute Impacts, and these results were consistent with the results
obtained from analyzing the Relative Impacts. The contrasts of the variables category,
correlation, and dimensional strength were statistically significant and practically important
while estimating the average ability.

The Impacts, or the errors, of the variable category were always large when the number of 
response categories was increased from 3 to 5, when the correlation was changed from 0.3 to 0.8, 
and when the dimensional strength was changed from 25/75 to 50/50. Some of the Impacts due 
to the variables of item slopes, range of thresholds, and test length were statistically significant, 
but not always practically important. The contrasts ranked by their magnitude when estimating 
the average ability were as follows: CAT (0.316), which was the biggest contrast, followed by 
COR (-0.263), DS (-0.181), THR (-0.171), and SLP (-0.104). From these ANOVA results it was 
concluded that those variables influenced the IRT procedure of estimation of ability when the 
assumption was violated. 

Results From Estimating Single Ability : 

Results from the analysis of the Relative Impact were summarized by the variable 
dimensional strength (DS). In this way it was possible to separate results according to one of two 
equally important abilities (DS=50/50), major ability of two unequally important abilities (ability 
represented by 75 percent of the total number of items), and minor ability of two unequally 
important abilities (ability represented by 25 percent of the total number of items). 

When estimating one of the two equally important abilities (DS = 50/50), the procedure
was not robust for a correlation of 0.3. The procedure was also not robust when the correlation
of 0.3 was combined with 5 response categories. When the correlation was increased to 0.8, the
procedure was robust only when the number of response categories was 3; the procedure was not
robust when the number of response categories was 5 and the correlation was 0.8.



DeAyala (1995) investigated the influence of dimensionality on parameter estimation 
when each dimension was represented by an equal number of items. When the assumption was 
violated, DeAyala found that estimated ability parameters were closer to the averages of the true 
abilities than either of the individual abilities. This result was consistent with the result of this 
study. In this study when each dimension was represented by an equal number of items, 
estimated ability was not closer to either of two equally important abilities. Fifty-eight out of 64 
Impacts were beyond the threshold of 20 percent (91 percent correct decision).

When estimating a major ability of two unequally important abilities (the ability was 
represented by 75 percent of the total number of items) results were robust to the violation, 
regardless of the correlation and the number of response categories used in this study. Results of 
this study were comparable with the results obtained by Way, Ansley, and Forsyth (1988). These 
authors found that, as the correlation between two dimensions decreased (from 0.9 to 0.6 to 0.3), 
estimated ability was strongly correlated with the major (dominant) ability. In the generated data 
the dominant ability was defined by its higher discrimination value. Also, Folk and Green 
(1989) found that, as the correlation between two abilities decreased (the correlations used were 
1.0, 0.8, 0.6, 0.4 and 0.2), estimated ability was closer to either one of the two abilities. 

When estimating a minor ability of two unequally important abilities (ability was
represented by 25 percent of the total number of items), generally the IRT procedure was not
robust to the violation. The same arguments used earlier from the study of Way, Ansley, and
Forsyth (1988) could be used here. Way et al. (1988) found that, when the correlation between
two abilities decreased, the estimated ability was strongly correlated with the dominant ability. A
similar conclusion was obtained by Folk and Green (1989) in their study.



Results from the effects of study variables . The results from the ANOVA analysis were 
helpful in identifying if the contrast of average impacts were meaningful or not. Results from the 
ANOVA analysis obtained by analyzing Absolute Impacts were consistent with the results 
obtained from analyzing the Relative Impacts. While estimating the major ability, contrasts of 
the variable correlation and dimensional strength were practically important, whereas, while 
estimating the minor ability, the contrasts of the variables category, correlation, dimensional 
strength, and threshold were practically important. The contrasts of the item slope and test length 
were not practically important when estimating both major and minor ability. 

Guidelines for Practitioners : 

The results obtained in this study were derived from the simulated data where true 
parameters were known. However, in practice these true parameters are not known. Thus, the 
same simulated data were analyzed using the Principal Component Analysis (PCA) and its 
results were presented to provide practitioners with some practical guidelines. The guidelines are 
based on the number of factors extracted, correlation between those factors, dimensional 
strength, and number of response categories. 

During the estimation of the average ability, when a single factor solution was obtained
from the PCA, practitioners could assume the IRT procedure to be robust to the violation. When
two factors were extracted by the PCA, the correlation between those factors was to be evaluated.
When the correlation between factors extracted by the PCA was greater than 0.4, the treatment
conditions were not problematic to the violation. Only when the correlation was smaller than
0.4 were the variables "number of response categories" and "dimensional strength" critical to the
robustness of the procedure. When the correlation was smaller than 0.4, but the number of
response categories was 3 and the dimensional strength was 50/50, practitioners could consider
those treatment conditions not problematic to the violation. All other treatment conditions
produced by the combination of the number of response categories of 3 and 5, the dimensional
strength of 25/75, and correlations smaller than 0.4 were problematic to the violation.

Those practitioners who would like to use a polytomous IRT model to analyze their
two-dimensional data could do so if the correlation between two abilities obtained through the
PCA was 0.4 or higher. The caveat is that the estimated abilities are the average of those two abilities.
For example, if practitioners are interested in testing the knowledge of mathematics, and, if the 
test is made up of specific domains within the mathematics area such as algebra and geometry, 
the estimated abilities are the average of the knowledge of algebra and geometry. 

During the estimation of one of the two equally important abilities, when there was only
one factor obtained from the PCA results, practitioners were advised that the IRT
procedure would not be robust to the violation. When there were two factors extracted by the
PCA, no matter what the correlation was between those factors (r > 0.001 and r < 0.74), all
treatment conditions were problematic to the violation.

During the estimation of the major ability of two unequally important abilities, when
there was only one factor extracted by the PCA, it was recommended that under those conditions
the treatment conditions were not problematic to the violation, i.e., the IRT procedure was robust
to the violation. When there were two factors extracted by the PCA, generally the procedure was
still robust. Thus, when a major ability was being estimated, practitioners could use the IRT
procedure because it was robust to the violation of the assumption under those given treatment
conditions.



During the estimation of the minor ability of two unequally important abilities, when one 
factor was extracted by the PCA, it was recommended to practitioners that the procedure was not
robust under any treatment conditions. When two factors were extracted, the IRT procedure for 
all treatment conditions was also not robust. Thus, when a minor ability was being estimated, no 
matter whether one or two factors were extracted by the PCA, practitioners should not use the 
IRT procedure because it was not robust to the violation of the assumption under those given 
treatment conditions. 
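The guidelines in this section amount to a small decision rule, which can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the function name, argument names, and target labels are hypothetical; the boundary case of a factor correlation exactly equal to 0.4 is treated as robust (the text says both "greater than 0.4" and "0.4 or higher"); and the correlation is compared as given, since the text does not say how a negative factor correlation should be handled.

```python
def irt_estimate_is_robust(target, n_factors, factor_corr=None, cat=None, ds=None):
    """Apply the practitioner guidelines summarized in the text.

    target      -- "average", "one_of_equal", "major", or "minor" (labels
                   are illustrative, not from the paper)
    n_factors   -- number of factors extracted by the PCA (1 or 2)
    factor_corr -- correlation between the two extracted factors
    cat         -- number of response categories (3 or 5)
    ds          -- dimensional strength, "25/75" or "50/50"
    """
    if target == "major":
        # Robust with one factor, and generally robust with two.
        return True
    if target in ("minor", "one_of_equal"):
        # Not robust whether one or two factors are extracted.
        return False
    if target != "average":
        raise ValueError(f"unknown target: {target!r}")

    # Average of the two abilities.
    if n_factors == 1:
        return True
    if factor_corr is None:
        raise ValueError("factor_corr is required when two factors are extracted")
    if factor_corr >= 0.4:  # assumption: 0.4 itself counts as robust
        return True
    # Below 0.4, only 3 categories with 50/50 strength was not problematic.
    return cat == 3 and ds == "50/50"


print(irt_estimate_is_robust("average", 2, factor_corr=0.25, cat=3, ds="50/50"))
print(irt_estimate_is_robust("average", 2, factor_corr=0.25, cat=5, ds="25/75"))
```

As the text itself cautions, such a rule applies only within the range of parameters simulated in this study.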

This was a computer simulation study and results obtained should not be generalized 
beyond the scope of parameters simulated in this study. All recommendations provided in this 
study were also limited to the scope of parameters of this study. This study included only two 
distinct dimensions when generating data to violate unidimensionality. Also, all variables 
(parameters) that were varied in this study were fixed to two levels. Variables (parameters) such 
as dimensional strength, correlation, and number of response categories were sensitive to the 
violation. Thus, conducting studies by varying the level of those sensitive variables may provide 
some definite answers to the question, under which conditions the IRT model would be robust. 
The results of this study showed that the procedure was robust to the violation for the correlation 
of 0.3 when the number of response categories was 3 and the dimensional strength was 50/50. 
However, further investigation could be conducted to determine if either the number of response 
categories or the dimensional strength was critical for results to be robust when the correlation 
was small. 






Table 1

Variables and Their Levels.

Description of Variables    Variable Label    Levels of Variable
Category                    CAT               3           5
Correlation                 COR               0.3         0.8
Dimensional Strength        DS                25/75       50/50
Slope                       SLP               0.5         1.0
Threshold                   THR               -1 to +1    -2 to +2
Test Length                 TL                8           16

CAT: number of response categories for each item in the test.

COR: correlation between two abilities in two-dimensional data; for unidimensional data
COR = 1.0.

DS: dimensional strength (or relative importance) determined by the number of items in each
dimension in the test. For example, 25/75 means that 25 percent of the total number of
items in a test represents one dimension, and the remaining 75 percent of the total
number of items in the test represents the other dimension.

SLP: slope of items used in the test (a concept similar to discrimination in dichotomous IRT
models).

THR: the range of values of item thresholds (the lowest and the highest). The number of
thresholds depends on the number of item categories.

TL: number of items used (8 and 16 items) in the test.
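As an illustration of how the two levels of each variable in Table 1 combine into treatment conditions, the sketch below enumerates a fully crossed design. Full crossing is an assumption inferred from the counts in the text (32 conditions at each level of dimensional strength); the names used are hypothetical.

```python
from itertools import product

# Levels of the six study variables as listed in Table 1.
LEVELS = {
    "CAT": (3, 5),
    "COR": (0.3, 0.8),
    "DS": ("25/75", "50/50"),
    "SLP": (0.5, 1.0),
    "THR": ("-1 to +1", "-2 to +2"),
    "TL": (8, 16),
}


def all_conditions(levels=LEVELS):
    """Return every combination of variable levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


conditions = all_conditions()
unequal = [c for c in conditions if c["DS"] == "25/75"]
equal = [c for c in conditions if c["DS"] == "50/50"]
print(len(conditions), len(unequal), len(equal))  # 64 32 32
```

Fixing DS at either level leaves 32 conditions, which matches the counts quoted in the results.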






Table 2

Mean, Standard Deviation and Percent of IMPCT_AV and Factor Analysis Results

[Table body not recoverable from the source scan. Legible headings include VARIABLES AND
THEIR LEVELS, IMPCT_AV (std dev, percent), and Factor Analysis Results (factor, corr).]




Table 3

Results of the ANOVA analysis of Absolute Impact while estimating Average Ability

Effects of    Biggest Contrast    Conditions (Variables and their Levels)
                                  CAT    COR    DS       SLP    THR
CAT            0.316              -      0.3    25/75    0.5    -1 to +1
COR           -0.263              5      -      25/75    1.0    -1 to +1
COR           -0.244              5      -      50/50    1.0    -1 to +1
COR           -0.222              5      -      25/75    0.5    -1 to +1
COR           -0.208              5      -      25/75    1.0    -2 to +2
COR           -0.202              3      -      25/75    1.0    -1 to +1
DS            -0.182              5      0.3    -        0.5    -1 to +1
DS            -0.181              5      0.3    -        1.0    -2 to +2
THR           -0.171              5      0.3    50/50    1.0    -
SLP            0.157              5      0.3    50/50    -      -1 to +1






Table 4

Mean, Standard Deviation and Percent of IMPCT_1 and IMPCT_2 and Factor Analysis Results

[Table body not recoverable from the source scan. Legible headings include VARIABLES AND
THEIR LEVELS, IMPCT_1 (mean, std dev, percent), IMPCT_2 (std dev), and Factor Analysis
(factor, corr).]


VO 

d 


VO 

d 


p 


p 


p 


p 


c/5 

X> 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


50/50 


u- 

O 

u 


O 


rn 

d 


d 


00 

d 


00 

d 


00 

d 


00 

d 


00 

d 


00 

d 


00 

d 


00 

d 


d 


d 


CO 

d 


CO 

d 


CO 

d 


CO 

d 


CO 

d 


CO 

d 


cd 

u 














fO 


fO 


fO 


fO 




vr» 




VO 


»o 


VO 


VO 


VO 


VO 




Table 4— continued 



(/) 

CO 

cd 

c 

< 


Corr 


o 

o 

o 

o 

p 


.67078 


o 

o 

o 

o 

p 


-.61329 


o 

o 

o 

o 

p 


.74360 


o 

o 

o 

o 

p 


.69518 






















o 




















p 


o 


















cd 

Um 


ts 




(N 




(N 




(N 




(N 
























c 




ON 


o 


(N 




1^ 


o 


ON 








p 




OO 


(N 


p 


p 


tn 






tri 


»ri 




od 






ON 


d 




V 


(N 




(N 


ro 


tr> 


00 


m 


1^ 




CL 


















*^l 




















H 


> 


00 


NO 


ON 


NO 




NO 


00 


Tt 


(J 


to 


o 


o 


fO 




<N 


o 


o 




cu 




o 


o 


o 


O 


O 


o 


o 


p 


s 


4— • 


o 


d 


d 


d 


d 


d 


d 


d 
























<N 

c 




ON 


o 


<N 






o 


ON 






o 


m 




fO 


Tt 




fo 


VO 




0> 




»-*< 
















E 


o 


d 


d 


d 


d 


d 


d 


d 




c 




(N 






00 










0> 


(N 


rn 




p 


(N 




p 


p 






<N 




NO 




(N 


NO 


od 


rn 




0> 


(N 




(N 


m 


»r> 


00 


ro 


NO 




CL 


















1 




















H 


> 

<u 






Tf 


00 


Tf 


NO 




(N 


CJ 


*o 


O 


o 


O 


<N 


o 


o 




O 


Oh 




p 


o 


O 


O 


o 


o 


O 


o 


s 


w 

V) 


d 


d 


d 


d 


d 


d 


d 


d 
























c 


fN 


ro 




ON 




o 


ON 


m 






On 


Tj- 


(N 


(N 


Tj- 


OO 


(N 


NO 




0> 


O 


















E 


d 


d 


d 


d 


d 


d 


d 


d 




o 


(N 


(N 






(N 




00 


Tj- 


UJ 




(N 


VO 


Tt 






o 


On 


CO 




O 




fo 


00 


o 




m 








m 




o 


(N 


(N 




CSj 






d 


d 


d 




d 


d 


d 


d 


CO 




00 


NO 


00 


NO 


00 


NO 


00 


NO 




















nJ 




















ua 








(N 


(N 


— 




(N 


(N 


> 




o 


o 


o 


o 


o 


o 


O 


O 


ua 










4— > 






4^ 




nJ 




















od 








































ua 




















K 












o 


o 


o 


o 


H 


7/5 


d 


d 


d 


d 










Q 




















2: 




















< 




o 


o 


o 


o 


o 


o 


o 


o 


CO 


-52 


m 




tn 


tn 




tri 




NO 


ua 


TJ 


d 


d 


d 


d 


d 


d 


d 


d 




















NO 


CQ 




















< 




















S 


u- 

Q 


00 


00 


00 


00 


00 


00 


00 


00 


< 


o 


d 


d 


d 


d 


d 


d 


d 


d 


> 










































cd 
















NO 




p 






















fgtt 






2 ,P 



« 52 — 



! <2 o 



S /?< 

^ — s 



(N 



O SJ 

’5 ^ 



cd O jQ 

O O TJ 



> 

*o 



£•— S o 

C/J U5 O % 



IT. 







Table 5

Results of the ANOVA analysis of Absolute Impact while estimating Ability θ1 and θ2.

                  Biggest Contrast When Estimating
Effects of    Ability θ1    Condition    Ability θ2    Condition

COR           -0.465        DS=25/75     -             -
DS            -0.344        COR=0.3      0.308         COR=0.3
DS            -0.280        CAT=5        -             -
COR           -0.239        DS=50/50     -0.248        DS=25/75
DS            -0.183        CAT=3        -             -
THR           -0.149        DS=25/75     -             -
CAT           0.145         DS=25/75     -             -
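
Table 5 lists, for each effect, its largest contrast and the condition in which that contrast occurs. As a rough illustration of this kind of summary, and not a reproduction of the analysis actually run for the study, the sketch below computes the difference in mean absolute impact between two correlation levels within each dimension-strength condition and then picks the condition with the largest difference. All data values, the correlation level 0.8, and the function name `simple_effect_contrasts` are hypothetical.

```python
from statistics import mean

# Hypothetical absolute-impact values (NOT taken from the study), keyed by
# (correlation level, dimension-strength condition); each list holds replications.
abs_impact = {
    (0.3, "25/75"): [0.60, 0.64, 0.62],
    (0.8, "25/75"): [0.20, 0.18, 0.22],
    (0.3, "50/50"): [0.40, 0.42, 0.38],
    (0.8, "50/50"): [0.25, 0.27, 0.23],
}


def simple_effect_contrasts(data, low, high):
    """Return {condition: mean(high level) - mean(low level)} for each condition."""
    conditions = sorted({cond for (_, cond) in data})
    return {
        cond: mean(data[(high, cond)]) - mean(data[(low, cond)])
        for cond in conditions
    }


# Contrast of the correlation effect within each dimension-strength condition.
contrasts = simple_effect_contrasts(abs_impact, low=0.3, high=0.8)

# The "biggest contrast" is the one with the largest absolute value.
biggest_condition = max(contrasts, key=lambda c: abs(contrasts[c]))

for cond, value in contrasts.items():
    print(f"COR contrast at DS={cond}: {value:.3f}")
print(f"Biggest contrast occurs at DS={biggest_condition}")
```

A negative contrast in this sketch means the mean absolute impact is smaller at the higher correlation level, which is the direction of most of the COR and DS entries in Table 5.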






References 



Andrich, David. (1978a). A Rating Formulation for Ordered Response Categories.
Psychometrika, 43, 561-573.

Andrich, David. (1978b). Application of a Psychometric Rating Model to Ordered
Categories Which are Scored with Successive Integers. Applied Psychological
Measurement, 2(4), 581-594.

Ackerman, T. A. (1989). Unidimensional IRT Calibration of Compensatory and
Noncompensatory Multidimensional Items. Applied Psychological Measurement, 13(2),
113-127.

Ansley, T. N. & Forsyth, R. A. (1985). An Examination of the Characteristics of
Unidimensional IRT Parameter Estimates Derived From Two-Dimensional Data. Applied
Psychological Measurement, 9(1), 37-48.

California's New Academic Assessment System. (1996, January). National Council on
Measurement in Education Quarterly Newsletter, 3(1).

Crocker, L. (1995, Winter). Editorial. Educational Measurement: Issues and Practice,
14(4).

Dawadi, B. R. (1998). Robustness of the Polytomous IRT Model to the Violations of the
Unidimensionality Assumption. Dissertation. Florida State University, Tallahassee, FL.

DeAyala, R. J. (1995a). The Influence of Dimensionality on Estimation in the Partial Credit
Model. Educational and Psychological Measurement, 55(3), 407-422.

DeAyala, R. J. (1995b). Item Parameter Recovery for the Nominal Response Model.
Paper presented at the annual meeting of the American Educational Research
Association, San Francisco, CA, April 18-22, 1995.

Dirir, M. A. & Sinclair, N. (1996, April). On reporting IRT ability scores when the test is
not unidimensional. Paper presented at the annual meeting of the NCME, New York.

Dodd, B. G., Koch, W. R., & DeAyala, R. J. (1993). Computerized Adaptive Testing
Using the Rasch Partial Credit Model: Effects of Item Pool Characteristics and Different
Stopping Rules. Educational and Psychological Measurement, 53, 61-77.






Dorans, N. J. & Kingston, N. M. (1985). The Effect of Violations of Unidimensionality
on the Estimation of Item and Ability Parameters and on Item Response Theory Equating
of the GRE Verbal Scale. Journal of Educational Measurement, 22(4), 249-262.

Downing, S. M. & Haladyna, T. M. (1996). A Model for Evaluating High-Stakes
Testing Programs: Why the Fox Should Not Guard the Chicken Coop. Educational
Measurement: Issues and Practice, 15(1), 5-12.

Drasgow, F. & Parsons, C. K. (1983). Application of Unidimensional Item Response
Theory Models to Multidimensional Data. Applied Psychological Measurement, 7(2),
189-199.

Folk, V. G. & Green, B. F. (1989). Adaptive Estimation When the Unidimensionality
Assumption of IRT is Violated. Applied Psychological Measurement, 13(4), 373-389.

Harris, J., Laan, S., & Mossenson, L. (1988). Applying Partial Credit Analysis to the
Construction of Narrative Writing Test. Applied Measurement in Education, 1(4),
335-346.

Hambleton, R. K. (1989). Principles and Selected Applications of Item Response
Theory. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 147-200). New
York: Wiley.

Harrison, D. A. (1986). Robustness of IRT Parameter Estimation to Violations of the
Unidimensionality Assumption. Journal of Educational Statistics, 11(2), 91-115.

Hambleton, R. & Swaminathan, H. (1985). Item Response Theory: Principles and
Applications. Boston, MA: Kluwer-Nijhoff Publishing.

Kirisci, L. & Hsu, T. C. (1995). The Robustness of BILOG to Violations of
Assumptions of Unidimensionality of Test Items and Normality of Ability. Paper
presented at the annual meeting of the NCME, San Francisco, April, 1995.

Luecht, R. M. & Miller, T. R. (1992a). Unidimensional Calibration and
Interpretation of Composite Traits for Multidimensional Tests. Applied Psychological
Measurement, 16(3), 279-293.

Luecht, R. M. & Miller, T. R. (1992b). Consideration of Multidimensionality in
Polytomous Item Response Models. Paper presented at the annual meeting of the AERA,
San Francisco, CA, April.

Masters, G. N. (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47(2),
149-174.






Muraki, E. (1993). Information Functions of the Generalized Partial Credit Model.
Applied Psychological Measurement, 17(4), 351-363.

Muraki, E. & Carlson, J. E. (1993). Full-information Factor Analysis for Polytomous
Item Responses. Paper presented at the annual meeting of the AERA, Atlanta, GA,
April.

Muraki, E. & Carlson, J. E. (1995). Full-information Factor Analysis for Polytomous
Item Responses. Applied Psychological Measurement, 19(1), 73-90.

Muraki, E. (1992a). A Generalized Partial Credit Model: Application of an EM
Algorithm. Applied Psychological Measurement, 16(2), 159-176.

Muraki, E. & Ankenmann, R. D. (1993). Applying the Generalized Partial Credit Model
to Missing Responses: Implementing the Scoring Function and a Lower Asymptote
Parameter. Paper presented at the annual meeting of the AERA, Atlanta, GA.

Muraki, E. & Bock, R. D. (1993). PARSCALE: IRT based Test Scoring and Item
Analysis for Graded Open-ended Exercises and Performance Tasks. Chicago, IL:
Scientific Software International.

Muraki, E. (1990). Fitting a Polytomous Item Response Model to Likert-Type Data.
Applied Psychological Measurement, 14(1), 59-71.

Muraki, E. (1996). RESGEN: Item Response Generator. Version 2.0. Educational Testing
Service, Princeton, New Jersey.

Norusis, M. J./SPSS Inc. (1993). SPSS Manual. Chicago, IL: SPSS, Inc.

Oosterhof, A. C. (1990). Classroom Applications of Educational Measurement.
Columbus, OH: Merrill Publishing Company.

Oshima, T. C. & Miller, M. D. (1990). Multidimensionality and IRT-Based Item
Invariance Indexes: The Effect of Between-Group Variation in Trait Correlation. Journal
of Educational Measurement, 27(3), 273-283.

Oshima, T. C. & Miller, M. D. (1992). Multidimensionality and Item Bias in Item
Response Theory. Applied Psychological Measurement, 16(3), 237-248.

Reckase, M. D. (1979). Unifactor Latent Trait Models Applied to Multifactor Tests:
Results and Implications. Journal of Educational Statistics, 4(3), 207-230.

Reckase, M. D. (1985). The Difficulty of Test Items That Measure More than One Ability.
Applied Psychological Measurement, 9(4), 401-412.



Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building Unidimensional Test
Using Multidimensional Items. Journal of Educational Measurement, 25(3), 193-203.

Sykes, R. C., Yen, W., & Ito, K. (1996). Scaling Polytomous Items That Have Been
Scored by Two Raters. Paper presented at the annual meeting of the NCME, New York,
NY, April.

Tate, R. L. (1992). Maintaining Scale Consistency for Florida Writing Assessment
Programs. Student Assessment Services, Bureau of Education Information and
Assessment Services, Department of Education, Tallahassee, Florida. (Unpublished)

Tate, R. L. (1993). Polytomous IRT Scaling of Florida Writing Assessment Data. Student
Assessment Services, Bureau of Education Information and Assessment Services,
Department of Education, Tallahassee, Florida. (Unpublished)

Traub, R. E. (1983). A Priori Considerations in Choosing an Item Response Model. In
R. K. Hambleton (Ed.), Application of Item Response Theory (pp. 57-70). Vancouver,
British Columbia: Educational Research Institute of British Columbia.

Wainer, H. & Thissen, D. (1987, Winter). Estimating Ability with the Wrong Model.
Journal of Educational Statistics, 12(4), 339-368.

Way, W. D., Ansley, T. N., & Forsyth, R. A. (1988). The Comparative Effects of
Compensatory and Noncompensatory Two-Dimensional Data on Unidimensional IRT
Estimates. Applied Psychological Measurement, 12(3), 239-252.

Wilson, M. & Iventosch, L. (1988). Using the Partial Credit Model to Investigate
Responses to Structured Sub-Tests. Applied Measurement in Education, 1(4), 319-334.

Wright, B. D., Congdon, R., & Schultz, M. (1989). A user's guide to MSTEPS (Version 2.4).
Chicago: MESA Psychometric Laboratory.

Zeng, Lingjia. (1989). Robustness of Unidimensional Latent Trait Models When Applied
to Multidimensional Data. Dissertation. University of Georgia, Athens, GA.










U.S. Department of Education
Office of Educational Research and Improvement (OERI)
National Library of Education (NLE)
Educational Resources Information Center (ERIC)

Reproduction Release
(Specific Document)

I. DOCUMENT IDENTIFICATION:

Title: Robustness of the Polytomous IRT Model to Violations of the Unidimensionality
Assumption

Author(s): Bhaskar R. Dawadi, Ph.D.

Corporate Source:

Publication Date:

II. REPRODUCTION RELEASE:

In order to disseminate as widely as possible timely and significant materials of interest to
the educational community, documents announced in the monthly abstract journal of the ERIC
system, Resources in Education (RIE), are usually made available to users in microfiche,
reproduced paper copy, and electronic media, and sold through the ERIC Document Reproduction
Service (EDRS). Credit is given to the source of each document, and, if reproduction release
is granted, one of the following notices is affixed to the document.

If permission is granted to reproduce and disseminate the identified document, please CHECK
ONE of the options and sign in the indicated space following.



Level 1
The sample sticker shown below will be affixed to all Level 1 documents.
Check here for Level 1 release, permitting reproduction and dissemination in microfiche or
other ERIC archival media (e.g. electronic) and paper copy.

Level 2A
The sample sticker shown below will be affixed to all Level 2A documents.
Check here for Level 2A release, permitting reproduction and dissemination in microfiche and
in electronic media for ERIC archival collection subscribers only.

Level 2B
The sample sticker shown below will be affixed to all Level 2B documents.
Check here for Level 2B release, permitting reproduction and dissemination in microfiche only.

Documents will be processed as indicated provided reproduction quality permits.
If permission to reproduce is granted, but no box is checked, documents will be processed at
Level 1.



I hereby grant to the Educational Resources Information Center (ERIC) nonexclusive permission
to reproduce and disseminate this document as indicated above. Reproduction from the ERIC
microfiche, or electronic media by persons other than ERIC employees and its system
contractors requires permission from the copyright holder. Exception is made for non-profit
reproduction by libraries and other service agencies to satisfy information needs of
educators in response to discrete inquiries.

Signature:

Printed Name/Position/Title: Bhaskar R. Dawadi, Ph.D., Consultant

Organization/Address:
Georgia Examining Boards
166 Pryor Street, SW, #303
Atlanta, GA 30303

Telephone: (404) 656-3903

Fax: (404) 657-6383

E-mail Address: [illegible in scan]

Date:



